{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Data Visualization with Python\n",
    "\n",
    "Authors: Prof. med. Thomas Ganslandt <Thomas.Ganslandt@medma.uni-heidelberg.de> <br>\n",
    "and Kim Hee <HeeEun.Kim@medma.uni-heidelberg.de> <br>\n",
    "\n",
    "Heinrich-Lanz-Center for Digital Health (HLZ) of the Medical Faculty Mannheim <br>\n",
    "Heidelberg University\n",
    "\n",
    "This is a part of a tutorial prepared for TMF summer school on 03.07.2019"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Prerequisite: Access the MIMIC-III Dataset\n",
    "The MIMIC (Medical Information Mart for Intensive Care) is a freely accessible database containing Intensive Care Unit (ICU) patients. The demo dataset is limited to 100 patients and publicly available as CSV files or as a single Postgres database backup file\n",
    "\n",
    ">**Instruction to access the MIMIC demo dataset:**\n",
    "<font size=\"3\">\n",
    ">1. Create an account on PhysioNet using the following link: https://physionet.org/register/\n",
    ">2. Navigate to the project page: https://physionet.org/content/mimiciii-demo/\n",
    ">3. Read the Data Use Agreement and click “I agree” to access the data\n",
    "</font>\n",
    "\n",
    "## Prerequisite: MIMIC-III files locally\n",
    "\n",
    "You should place the following MIMIC-III data files in the `data/` subfolder:\n",
    "\n",
    "* ADMISSIONS.csv\n",
    "* PATIENTS.csv\n",
    "* CPTEVENTS.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "<img src=\"images/er_mimic.png\" style=\"width:100%\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "* Database description: https://mimic.physionet.org/gettingstarted/overview/\n",
    "* Table description: https://mimic.physionet.org/mimictables/admissions/\n",
    "* ER-Diagram: https://mit-lcp.github.io/mimic-schema-spy/relationships.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "**Agenda**\n",
    "* <b>Pandas</b>\n",
    "* </b>Pandas-Profiling</b>\n",
    "* </b>Missingno</b>\n",
    "* </b>Wordcloud</b>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Pandas\n",
    "http://pandas.pydata.org/pandas-docs/stable/reference/ <br>\n",
    "Pandas is a Python library for exploring, processing, and model data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Pandas supports charting a tabular dataset\n",
    "DataFrame.plot([x, y], **kind**)\n",
    "> **kind** :\n",
    "* 'line': line plot (default)\n",
    "* 'bar': vertical bar plot\n",
    "* 'barh': horizontal bar plot\n",
    "* 'hist': histogram\n",
    "* 'box': boxplot\n",
    "* 'kde': Kernel Density Estimation plot\n",
    "* 'density': same as 'kde'\n",
    "* 'area': stacked area plot\n",
    "* 'pie': pie plot\n",
    "* 'scatter': scatter plot\n",
    "* 'hexbin': Hexagonal binning plot \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "<img src=\"images/DataVisualisation.jpeg\" style=\"height: 1000px;\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Visualize the admission table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "pd.set_option('display.max_columns', 999)\n",
    "import pandas.io.sql as psql\n",
    "# plot a figure directly on Notebook\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "a = pd.read_csv(\"data/ADMISSIONS.csv\")\n",
    "a.columns = map(str.lower, a.columns)\n",
    "a.groupby(['marital_status']).count()['row_id'].plot(kind='pie')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "a.groupby(['religion']).count()['row_id'].plot(kind = 'barh') "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "p = pd.read_csv(\"data/PATIENTS.csv\")\n",
    "p.columns = map(str.lower, p.columns)\n",
    "ap = pd.merge(a, p, on = 'subject_id' , how = 'inner')\n",
    "ap.groupby(['religion','gender']).size().unstack().plot(kind=\"barh\", stacked=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "c = pd.read_csv(\"data/CPTEVENTS.csv\")\n",
    "c.columns = map(str.lower, c.columns)\n",
    "ac = pd.merge(a, c, on = 'hadm_id' , how = 'inner')\n",
    "ac.groupby(['discharge_location','sectionheader']).size().unstack().plot(kind=\"barh\", stacked=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "**Agenda**\n",
    "* </b>Pandas</b>\n",
    "* <b>Pandas-Profiling</b>\n",
    "* </b>Missingno</b>\n",
    "* </b>Wordcloud</b>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Pandas-Profiling\n",
    "https://github.com/pandas-profiling/pandas-profiling <br>\n",
    "Pandas-Profiling is a Python library for exploratory data analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### Import pandas-profiling (1/3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "# !conda install -c conda-forge pandas-profiling -y\n",
    "import pandas_profiling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### Load the admissions table (2/3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "a = pd.read_csv(\"data/ADMISSIONS.csv\")\n",
    "a.columns = map(str.lower, a.columns)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Profile the table (3/3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "# ignore the times when profiling since they are uninteresting\n",
    "cols = [c for c in a.columns if not c.endswith('time')]\n",
    "pandas_profiling.ProfileReport(a[cols])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "**Agenda**\n",
    "* </b>Pandas</b>\n",
    "* </b>Pandas-Profiling</b>\n",
    "* <b>Missingno</b>\n",
    "* </b>Wordcloud</b>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Missingno\n",
    "https://github.com/ResidentMario/missingno <br>\n",
    "Missingno offers a visual summary of the completeness of a dataset. This example brings some intuitive thoughts about `ADMISSIONS` table: \n",
    "* Not every patient is admitted to the emergency department as there are many missing values in `edregtime` and `edouttime`.\n",
    "* `language` data of patients is mendatory field, but it used to be not."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# !conda install -c conda-forge missingno -y\n",
    "import missingno as msno\n",
    "msno.matrix(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "**Agenda**\n",
    "* </b>Pandas</b>\n",
    "* </b>Pandas-Profiling</b>\n",
    "* </b>Missingno</b>\n",
    "* <b>Wordcloud</b>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Wordcloud\n",
    "https://github.com/amueller/word_cloud <br>\n",
    "Wordcloud visualizes a given text in a word-cloud format <br>\n",
    "This example illustrates that majority of patients suffered from sepsis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Import the Wordcloud package (1/4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "# !conda install -c conda-forge wordcloud -y\n",
    "from wordcloud import WordCloud"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### Prepare an input text in string (2/4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "text = str(a['diagnosis'].values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### Generate a word-cloud from the input text (3/4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "wordcloud = WordCloud().generate(text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Plot the word-cloud (4/4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "plt.figure(figsize = (10,10))\n",
    "plt.imshow(wordcloud, interpolation = 'bilinear')\n",
    "plt.axis(\"off\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Question?\n",
    "Authors: Prof. med. Thomas Ganslandt <Thomas.Ganslandt@medma.uni-heidelberg.de> <br>\n",
    "and Kim Hee <HeeEun.Kim@medma.uni-heidelberg.de> <br>\n",
    "\n",
    "Heinrich-Lanz-Center for Digital Health (HLZ) of the Medical Faculty Mannheim <br>\n",
    "Heidelberg University\n",
    "\n",
    "This is a part of a tutorial prepared for TMF summer school on 03.07.2019"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "dataviz",
   "language": "python",
   "name": "dataviz"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "nbTranslate": {
   "displayLangs": [
    "*"
   ],
   "hotkey": "alt-t",
   "langInMainMenu": true,
   "sourceLang": "en",
   "targetLang": "fr",
   "useGoogleTranslate": true
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
